I. REAL PARTY IN INTEREST

The above referenced application is wholly assigned to International Business Machines 
Corporation ("IBM"), a New York corporation having a principal place of business at Armonk,
New York. 

II. RELATED APPEALS AND INTERFERENCES

There are no related appeals or interferences known to Appellant that will directly affect, 
be directly affected by, or have a bearing on the Board's decision in this appeal. 

III. STATUS OF CLAIMS

Claims 1, 2, 4-8, 15, and 18-21 are pending in this application. All pending claims stand 
rejected under the Final Action. More particularly: 

Claims 1, 2, 4-6, 15, and 18-21 were rejected under 35 USC § 103(a) as being unpatentable
over Wilcox, U.S. Patent No. 5,199,077, (hereinafter Wilcox) in view of Boman et al., U.S.
Patent No. 6,480,819, (hereinafter Boman) in further view of Lee, U.S. Patent No. 6,067,520,
(hereinafter Lee).

Claims 7 and 8 were rejected under 35 USC § 103(a) as being unpatentable over Wilcox
in view of Boman et al. and Lee, and further in view of well known prior art.

The rejections of all pending claims are appealed herein. The rejections of independent
claims 1 and 15 and dependent claims 19 and 20 are argued specifically.

IV. STATUS OF AMENDMENTS

No amendments have been filed subsequent to the final rejection. 

07/11/2005 15:36 LALLY & LALLY LLP -> USPTO CENTRAL
RCVD AT 7/11/2005 4:37:25 PM [Eastern Daylight Time] * SVR:USPTO-EFXRF-1/0 * DNIS:8729306 * CSID:5124289871 * DURATION (mm-ss):05-00

Commissioner for Patents
Appeal Brief
Serial: 09/498234
Art Unit: 2654
Examiner: A. Armstrong
Docket No. AUS990879US1

V. SUMMARY OF CLAIMED SUBJECT MATTER

Independent claim 1 defines a system for locating a desired audio segment within a
storage device. The system includes an input device (e.g., keyboard 104) and a media player
(108) [page 4, lines 11-14]. The input device (104) transmits input sample text [page 4, lines 14-17]
that is indicative of the audio segment [page 2, lines 9-11]. The media player (108) plays
audio content stored on the storage device (e.g., disk 109) [page 4, lines 17-19]. The system has 
a sample converter (104) that generates an input sample diphthong sequence (105) upon 
receiving the input sample text (103) from the input device [page 5, lines 4-6]. The input sample
diphthong sequence (105) is a digital representation of the diphthong components of the input 
sample [page 5, lines 10-17]. An audio converter (122) generates an audio content diphthong 
sequence (125) comprising a digital representation of the diphthong components of the audio 
content of the storage device [page 7, lines 19-22]. A comparator (130) detects a match between
the input sample diphthong sequence (105) and a portion of the audio content diphthong 
sequence (125) [page 7, line 25 to page 8, line 2]. 

Independent claim 15 defines a computer program product for locating an audio segment 
in a storage device that includes computer executable instructions [see page 9, lines 25 to 28] 
including first converter means (104) for generating a first diphthong sequence (105) upon 
receiving input sample text (103) where the first diphthong sequence (105) is indicative of the 
input sample text. The product includes second converter means (122) for generating a second
diphthong sequence (125) from audio information stored on the storage device (109). The
product includes comparator means (130) for locating a portion of the second diphthong 
sequence (125) that matches the first diphthong sequence (105) according to a specified set of 
match criteria. 

The elements of independent claim 15 may be construed as means-plus-function elements 
in which case the structure or acts described in the specification corresponding to the first 
converter means include sample converter (104) [page 5, lines 4-9] and step (144) [page 10, lines 
17-20]. The structure or acts corresponding to the second converter means include audio
converter (122) [page 7, lines 13-23] and step 146 [page 10, lines 21-24]. The structure or acts
corresponding to the comparator means include string comparator 130 [page 7, line 24 through
page 8, line 6] and steps 148, 152, and 154 [page 10, line 23 through page 11, line 5].
VI. GROUNDS OF REJECTION TO BE REVIEWED ON APPEAL

The rejection of claims 1, 2, 4-6, 15, and 18-21 under 35 USC § 103(a) as unpatentable 
over Wilcox (U.S. Patent No. 5,199,077) in view of Boman et al. (U.S. Patent No. 6,480,819), in 
further view of Lee (U.S. Patent No. 6,067,520).

VII. ARGUMENT

Section 103(a) rejection of Claims 1, 2, 4-6, 15, and 18-21 as unpatentable over Wilcox, Lee, and
Boman

Claims 1 and 15

1. No Motivation to Modify Wilcox to Incorporate Mono-syllabic Features of Lee

The Final Action fails to establish a prima facie case of obviousness under Section 103(a)
because there is no motivation to modify the cited references to arrive at the claimed
combination. Specifically, there is no motivation to modify Wilcox to incorporate the
mono-syllabic features of Lee. A prima facie case of obviousness under Section 103(a) requires some
suggestion or motivation, either in the references themselves or in the knowledge generally
available to one of ordinary skill in the art, to modify the reference or to combine reference
teachings, and a reasonable expectation of success. MPEP 2131. Moreover, the teaching or
suggestion to make the claimed combination and the reasonable expectation of success must both
be found in the prior art, not in applicant's disclosure. MPEP 2131 (citing In re Vaeck, 947 F.2d
488, 20 USPQ2d 1438 (Fed. Cir. 1991)).

There is no motivation, and no reasonable expectation of success, to modify Wilcox to
incorporate the mono-syllabic teachings of Lee because Wilcox explicitly teaches away from the
mono-syllabic approach and indicates a lower expectation of success for a mono-syllabic
implementation.

The claims under consideration recite the generation of diphthong sequences in an
invention for locating specified audio segments on disk or other storage. Diphthongs are
explicitly defined in the specification as mono-syllabic speech sounds [page 5, line 5]. The Final
Action correctly acknowledges that Wilcox does not teach the use or generation of diphthongs.
The Final Action supports the Section 103(a) rejection of the claims under consideration by relying on
limitations taught in Lee. Lee teaches the processing of monosyllables in a system for converting
Mandarin speech into Chinese characters. The Final Action states that it would have been
obvious to one skilled in the art to modify Wilcox for one reason, namely, "to implement
monosyllables as the acoustic units of recognition as taught by Lee, for the purpose of improving
recognition results."

Directly contradicting the Final Action's stated basis for combining the references, 
Wilcox states that its system "works better for multi-syllable words than for single syllable 
words." (Wilcox, Column 4, lines 31-33). Wilcox further supports this statement while 
discussing FIG. 14 depicting experimental results indicating that the probability of a false 
detection in its word spotting algorithm increases from 1% per sentence to nearly 10% per 
sentence for a monosyllabic word (column 10, lines 44-48). Wilcox concludes from these results 
that "better keyword detection and lower false alarm rates are obtained using keyword with more 
syllables" and suggests that users use "phrases rather than single words for editing and indexing 
applications." 

As stated in MPEP 2143.01, "[t]he test for obviousness is what the combined teachings of 
the references would have suggested to one of ordinary skill in the art... Where the teachings of 
two or more prior art references conflict, the examiner must weigh the power of each reference to 
suggest solutions to one of ordinary skill in the art, considering the degree to which one reference 
might accurately discredit another." Citing In re Young, 927 F.2d 588, 18 USPQ2d 1089 (Fed. 
Cir. 1991). In this case, the cited references would not have suggested to one of ordinary skill in
the art a diphthong-based implementation of Wilcox. One of ordinary skill in the art would 
recognize Lee as an application specific solution in which the nature of Mandarin speech and its 
relationship to Chinese characters motivated Lee to use a mono-syllabic approach. See, e.g., Lee, 
column 4, lines 20-26. One of ordinary skill would not, however, have been motivated to modify 
Wilcox to employ monosyllable acoustic units in the face of Wilcox's explicit statements and 
experimental results indicating that the false detection rate increases for monosyllables. 

Because the cited references would not have motivated one of ordinary skill in the field to 
modify Wilcox to incorporate monosyllable acoustic units, Appellant submits that the Final
Action does not establish a prima facie case of obviousness. Accordingly, Appellant respectfully
requests the Board to reverse the Section 103(a) rejection of the claims under consideration and 
remand the matter to the Examiner for further prosecution. 



2. References Fail to Teach All Claimed Limitations

The Final Action fails to establish a prima facie case of obviousness under Section 103(a)
because the cited references do not teach or suggest all of the claimed limitations. Specifically,
the claims under consideration recite "an input device for transmitting input sample text" (claim
1) and "responsive to receiving input sample text" (claim 15). The cited references do not teach
the use of text as the input to an audio detection application. A prima facie case of obviousness
under Section 103(a) requires that the references teach or suggest all the claim limitations.

The Final Action correctly acknowledges that Wilcox does not teach that the input is a
text sample. Supporting the obviousness rejection of the claims under consideration, the Final
Action states that "Boman teaches an automatic search of audio channels by matching spoken
words against closed-caption audio content, which converts spoken input into text for searching."
Appellant has emphasized the words "spoken input" used by the Final Action in describing the
teachings of Boman because these words clearly differentiate Boman's input (spoken input) from
the claimed invention's input (text input). Boman is summarized concisely in the first
three lines of the patent: "The present invention relates generally to interactive television and
more particularly, to a system that allows the user to select channels by spoken request." While
Boman does refer to text when it describes functionality "to detect closed caption text or audio
channel speech that matches the user's previously spoken request" (Boman, column 2, lines 3-5),
this passage and others like it always refer to the invention's input as spoken input. Boman is
motivated by the desire to give a television viewer, who may have 200 channels or more from
which to select, a user-friendly way to select a desired channel. As such, Boman explicitly
teaches a speech input system to fulfill this desire and neither teaches nor suggests text input
as a suitable alternative. Boman states that existing on-screen programming guides, which are
activated through remote control keypad entries analogous to text input, are not adequate for a
system having a large number of channels.






Because the cited references neither teach nor suggest the claimed limitation of text input, 
the Final Action fails to establish a prima facie case of obviousness. Accordingly, Appellant 
respectfully requests the Board to reverse the obviousness rejection of the claims under 
consideration and remand the Application to the Examiner for further prosecution. 

3. No Motivation to Modify Wilcox to Incorporate Text Input 

Assuming for the sake of argument that Boman does teach or suggest the use of text 
input, the Final Action still fails to establish a prima facie case of obviousness because there is no 
motivation or suggestion to modify Wilcox to incorporate text input. As stated above, a prima 
facie case of obviousness requires the existence of a motivation or suggestion to combine 
references to arrive at the claimed combination. 

There is no motivation to modify Wilcox to incorporate text input because the features of 
Wilcox are uniquely applicable to a speech recognition system. The Wilcox abstract indicates 
that Wilcox allows a speaker to specify keywords dynamically and to train the system via a single 
repetition of a keyword. Non-keyword speech is modeled using prerecorded samples for 
continuous speech. The application is intended for interactive applications such as the editing of 
voice mail or mixed media documents and for keyword indexing in single-speaker audio or
video recordings. (See Wilcox, Abstract). In summary, Wilcox is unambiguously a speech
recognition system and, more particularly, a word spotting system in which speaker utterances 
are modeled as vectors that describe sound. Moreover, Wilcox is a speaker-dependent 
application in which differences between the speech characteristics of different users are 
accounted for in the speech models. Thus, for example, Wilcox lists as its first objective a
system for spotting a keyword spoken by a talker in previously recorded speech by the same
talker. (Wilcox, column 2, lines 14-16).

The Final Action states that it would have been obvious to modify Wilcox to allow for 
textual input for the purpose of providing access to a user unable to vocalize a request. Appellant 
respectfully disagrees. All of the significant features of Wilcox are directed at modeling and 
detecting speaker-dependent characteristics which are not applicable for a user who cannot 
vocalize a request. Referring to FIG. 2, element 20 of Wilcox, for example, the first step in the
Wilcox wordspotting method is to analyze previous utterances of the same talker to create the 
initial and background speech models for the speaker. This step is a necessary requirement to
take advantage of Wilcox's speaker-dependent features. Because a user who cannot vocalize a 
request cannot provide a sample of his or her previous utterances, Wilcox is effectively 
inoperable for users unable to vocalize requests. 

Because the cited references would not have motivated one of ordinary skill in the field to 
modify Wilcox to incorporate text input, Appellant submits that the Final Action does not 
establish a prima facie case of obviousness. Accordingly, Appellant respectfully requests the
Board to reverse the Section 103(a) rejection of the claims under consideration and remand the 
matter to the Examiner for further prosecution. 



Claim 19

The Final Action fails to establish a prima facie case of obviousness under Section
103(a). Claim 19 recites a limitation in which the first and second diphthong sequences are compared
using an "exact" matching criterion (as distinguished in the specification from a fuzzy criterion).
Although the Final Action rejected claim 19, Appellant is unable to find any reference to the 
claim or the claim limitations in the Final Action. Because the burden of establishing the prima 
facie case of obviousness lies with the Examiner, Appellant submits that the Final Action does 
not establish a prima facie case of obviousness for claim 19. 



Claim 20 

The Final Action fails to establish a prima facie case of obviousness under Section 
103(a). Claim 20 recites a limitation in which the first and second diphthong sequences are compared
using "fuzzy" matching criteria (as distinguished in the specification from an exact criterion).
Although the Final Action rejected claim 20, Appellant is unable to find any reference to the 
claim or the claim limitations in the Final Action. Because the burden of establishing the prima 
facie case of obviousness lies with the Examiner, Appellant submits that the Final Action does 
not establish a prima facie case of obviousness for claim 20. 
CONCLUSION



In view of the foregoing, Appellant submits that the pending claims are allowable over
the cited references and respectfully requests the Board to reverse the pending rejections
and remand the application to the Examiner for reconsideration consistent with an order that the
Examiner allow this case unless a proper rejection of the claims can be made.
VIII. CLAIMS APPENDIX

TEXT OF CLAIMS PRESENTED ON APPEAL: 

1 (previously presented). A system for locating an audio segment within a storage device, 
comprising: 

an input device suitable for transmitting input sample text indicative of the audio
segment;

a media player suitable for playing audio content stored on the storage device;

a sample converter configured to generate an input sample diphthong sequence in
response to receiving the input sample text from the input device, wherein the input
sample diphthong sequence comprises a digital representation of the diphthong
components of the input sample;

an audio converter configured to generate an audio content diphthong sequence
comprising a digital representation of the diphthong components of the audio content of
the storage device; and

a comparator configured to detect a match between the input sample diphthong sequence
and a portion of the audio content diphthong sequence.

2 (previously presented). The system of claim 1, wherein the input device comprises a keyboard.

3 (canceled).

4 (original). The system of claim 1, wherein the input device comprises the media player and the
input sample comprises information recorded on a storage media.

5 (original). The system of claim 1, wherein the comparator is further configured to produce a
signal indicative of the location within the storage device of the matching portion of the audio
content diphthong sequence.

6 (original). The system of claim 5, further comprising a media player configured to receive the
location signal from the comparator and to advance the storage device to the location indicated
by the location signal.

7 (original). The system of claim 1, wherein the storage medium comprises a compact disc.

8 (original). The system of claim 1, wherein the storage medium comprises a digital video disc.

9-14 (canceled).

15 (previously presented). A computer program product for locating an audio segment in a 
storage device, the computer program product comprising a computer readable medium 
configured with processor executable instructions, comprising: 

first converter means for generating a first diphthong sequence responsive to receiving 
input sample text, wherein the first diphthong sequence is indicative of the input sample 
text; 

second converter means for generating a second diphthong sequence from audio 
information stored on the storage device; and 

comparator means for locating a portion of the second diphthong sequence, wherein the 
located portion of the second diphthong sequence and the first diphthong sequence match 
according to a specified set of match criteria. 

16 (canceled). 

17 (canceled). 

18 (original). The computer program product of claim 15, wherein the comparator means 
includes means for indicating the location within the storage device of the audio information 
corresponding to the second diphthong sequence. 

19 (original). The computer program product of claim 15, wherein the match criteria require 
exact match between the first and second diphthong sequence. 

20 (original). The computer program product of claim 15, wherein the match criteria are fuzzy
criteria.

21 (original). The computer program product of claim 15, wherein the computer readable 
medium comprises a storage medium is one of a floppy diskette, hard disk, CD ROM, or 
magnetic tape. 


